WHAT IS CLAIMED IS:

1. A method for detecting navigation bars in a document, 
the method comprising: 

a) segmenting the document into components; and 

b) for each of the components, determining whether or 
not the component is anchor-heavy, wherein if the 
component is anchor-heavy, it is determined to be a 
navigation bar. 

2. The method of claim 1 wherein the act of determining 
whether or not the component is anchor-heavy is based on a 
number of anchors in the component and a number of 
non-anchor words in the component. 

3. The method of claim 1 wherein the act of determining 
whether or not the component is anchor-heavy includes 

i) determining a number of anchors in the 
component, 

ii) determining a number of non-anchor words in 
the component, and 

iii) if the number of anchors is greater than a 
predetermined threshold and if the number of 
anchors is greater than the number of non-anchor 
words, then determining that the component is 
anchor-heavy. 

4. The method of claim 3 wherein the predetermined 
threshold is about three. 

5. The method of claim 3 wherein the predetermined 
threshold is three. 
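Claims 1 through 5 reduce navigation-bar detection to an anchor-heavy test. The following is a minimal Python sketch. It assumes that each component arrives as an HTML fragment and that anchors are `<a>` elements. It also assumes that a word is any whitespace-separated token outside an anchor, so a `|` separator counts as one word. `AnchorCounter` and `is_anchor_heavy` are hypothetical names, and the default threshold of three follows claims 4 and 5:

```python
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Counts <a> elements and the words that fall outside any anchor."""

    def __init__(self):
        super().__init__()
        self.anchors = 0
        self.non_anchor_words = 0
        self._depth = 0  # > 0 while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors += 1
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0:
            self.non_anchor_words += len(data.split())


def is_anchor_heavy(html_fragment, threshold=3):
    """Claim 3: more anchors than the threshold and than non-anchor words."""
    counter = AnchorCounter()
    counter.feed(html_fragment)
    counter.close()
    return (counter.anchors > threshold
            and counter.anchors > counter.non_anchor_words)


nav = ('<a href="/">Home</a> | <a href="/a">About</a> | '
       '<a href="/p">Products</a> | <a href="/c">Contact</a>')
print(is_anchor_heavy(nav))  # True: 4 anchors, 3 non-anchor words
print(is_anchor_heavy('<p>Read our <a href="/faq">FAQ</a> for details.</p>'))  # False
```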

6. The method of claim 1 wherein the act of determining
whether or not the component is anchor-heavy includes 

i) determining a first count to be a number of 
anchors in the component, 

ii) determining a second count to be a number of 
non-anchor words in the component, 

iii) incrementing the second count by the number 
of words in an anchor having more words than a 
predetermined threshold to determine a non-anchor 
word count, and 

iv) if the first count is greater than a second 
predetermined threshold and if the first count is 
greater than the non-anchor word count, then 
determining that the component is anchor-heavy. 

7. The method of claim 6 wherein the predetermined 
threshold is about four. 

8. The method of claim 6 wherein the predetermined 
threshold is four. 
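Claims 6 through 8 refine the test so that unusually long anchors, which read more like running text, count toward the non-anchor words. The sketch below takes the anchor texts and the non-anchor word count as inputs, which is a hypothetical interface. It reads the "predetermined threshold" of claims 7 and 8 as the long-anchor word limit of four. For the second threshold, on the anchor count, it assumes a value of three borrowed from claim 4. The claims do not confirm either reading:

```python
def is_anchor_heavy_adjusted(anchor_texts, non_anchor_words,
                             long_anchor_words=4, min_anchors=3):
    """Anchor-heavy test of claim 6, with long anchors counted as text.

    anchor_texts: list of anchor strings in the component (hypothetical input).
    non_anchor_words: number of words in the component outside any anchor.
    long_anchor_words: anchors with more words than this are treated as text.
    min_anchors: assumed value of the "second predetermined threshold".
    """
    first_count = len(anchor_texts)            # step i)
    non_anchor_word_count = non_anchor_words   # step ii)
    for text in anchor_texts:                  # step iii)
        n_words = len(text.split())
        if n_words > long_anchor_words:
            non_anchor_word_count += n_words
    # step iv)
    return first_count > min_anchors and first_count > non_anchor_word_count


print(is_anchor_heavy_adjusted(["Home", "About us", "Products", "Contact", "Blog"], 0))  # True
print(is_anchor_heavy_adjusted(
    ["Read the full story about the election", "Sports", "Weather", "Markets"], 0))  # False
```

The second call shows the effect of step iii). Four anchors and no other words would pass the plain test of claim 3, but the seven-word headline is counted as text, so the component no longer qualifies.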

9. The method of claim 1 wherein the act of segmenting the 
document into components includes generating a parse tree 
based on the document, wherein a first node corresponding 
to a first component is a child of a second node of a 
second component if the first component is included in the 
second component. 
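Claim 9 segments the document through a parse tree in which a node is a child of another node when its component is included in the other node's component. One possible sketch assumes the document is HTML and that every element is a component. It builds such a tree with Python's standard `html.parser` module. `Node`, `TreeBuilder`, `build_parse_tree`, and the `VOID_TAGS` set are hypothetical names. Error recovery is deliberately simplified: an end tag closes the nearest open element with that name, and implied end tags (for example, an unclosed `<li>`) are not inferred.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML, so they cannot contain others.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = []  # text chunks appearing directly inside this node


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # following content is included in this node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data)


def build_parse_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


root = build_parse_tree('<div><ul><li><a href="/">Home</a></li></ul><p>Hi</p></div>')
div = root.children[0]
print([child.tag for child in div.children])  # ['ul', 'p']
```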

10. The method of claim 9 wherein the act of determining 
whether or not the component is anchor-heavy is based on 
(i) a number of anchors in a node corresponding to the 
component and all descendant nodes of the node, and (ii) a
number of non-anchor words in the node corresponding to the 
component and all the descendant nodes of the node. 

11. The method of claim 9 wherein the act of determining 
whether or not the component is anchor-heavy includes 

i) determining a number of anchors in a node 
corresponding to the component and all descendant 
nodes of the node, 

ii) determining a number of non-anchor words in 
the node corresponding to the component and all 
the descendant nodes of the node, and 

iii) if the number of anchors is greater than a 
predetermined threshold and if the number of 
anchors is greater than the number of non-anchor 
words, then determining that the component is 
anchor-heavy. 

12. The method of claim 11 wherein the predetermined 
threshold is about three. 

13. The method of claim 11 wherein the predetermined 
threshold is three. 
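Claims 10 through 13 count anchors and non-anchor words over a node and all of its descendants. The following minimal sketch uses a simplified, hypothetical `Node` that holds a tag, the text directly inside it, and its children. It treats every word inside an `<a>` subtree as anchor text, and the default threshold of three follows claims 12 and 13:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str
    text: str = ""  # text directly inside this node
    children: List["Node"] = field(default_factory=list)


def subtree_counts(node: Node, inside_anchor: bool = False) -> Tuple[int, int]:
    """Return (anchors, non-anchor words) for node and all of its descendants."""
    is_anchor = node.tag == "a"
    in_anchor = inside_anchor or is_anchor
    anchors = 1 if is_anchor else 0
    words = 0 if in_anchor else len(node.text.split())
    for child in node.children:
        child_anchors, child_words = subtree_counts(child, in_anchor)
        anchors += child_anchors
        words += child_words
    return anchors, words


def is_anchor_heavy_node(node: Node, threshold: int = 3) -> bool:
    anchors, words = subtree_counts(node)
    return anchors > threshold and anchors > words


def item(label: str) -> Node:
    return Node("li", children=[Node("a", text=label)])


menu = Node("ul", children=[item(s) for s in ["Home", "News", "Sports", "Contact"]])
article = Node("div", children=[Node("p", text="Markets rallied today after the report."), menu])
print(subtree_counts(menu), is_anchor_heavy_node(menu))        # (4, 0) True
print(subtree_counts(article), is_anchor_heavy_node(article))  # (4, 6) False
```

The same menu is anchor-heavy on its own, but the enclosing `div` is not, because its subtree counts also include the paragraph's six words.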

14. The method of claim 9 wherein the act of determining 
whether or not the component is anchor-heavy includes 

i) determining a first count to be a number of 
anchors in a node corresponding to the component 
and all descendant nodes of the node, 

ii) determining a second count to be a number of 
non-anchor words in a node corresponding to the 
component and all descendant nodes of the node, 

iii) incrementing the second count by the number
of words in an anchor having more words than a
predetermined threshold to determine a non-anchor
word count, and

iv) if the first count is greater than a second
predetermined threshold and if the first count is
greater than the non-anchor word count, then
determining that the component is anchor-heavy. 
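Claim 14 combines the subtree counts of claim 11 with the long-anchor adjustment of claim 6. The sketch below applies both to the same kind of hypothetical `Node`. The defaults `long_anchor_words=4` and `min_anchors=3` are assumptions carried over from claims 7 and 4, since claim 14 fixes neither threshold:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def words_in(node: Node) -> int:
    """Count the words in a node and all of its descendants."""
    return len(node.text.split()) + sum(words_in(c) for c in node.children)


def collect(node: Node) -> Tuple[List[int], int]:
    """Return (word count of each anchor, non-anchor words) for a subtree."""
    if node.tag == "a":
        return [words_in(node)], 0
    anchor_sizes, words = [], len(node.text.split())
    for child in node.children:
        sizes, child_words = collect(child)
        anchor_sizes.extend(sizes)
        words += child_words
    return anchor_sizes, words


def is_anchor_heavy_node_adjusted(node: Node, long_anchor_words: int = 4,
                                  min_anchors: int = 3) -> bool:
    anchor_sizes, second_count = collect(node)       # steps i) and ii)
    first_count = len(anchor_sizes)
    non_anchor_word_count = second_count + sum(      # step iii)
        n for n in anchor_sizes if n > long_anchor_words)
    # step iv)
    return first_count > min_anchors and first_count > non_anchor_word_count


links = [Node("a", text=t) for t in ["World", "Business", "Sports", "Tech"]]
headline = Node("a", text="Central bank holds interest rates steady again")
print(is_anchor_heavy_node_adjusted(Node("ul", children=links)))                # True
print(is_anchor_heavy_node_adjusted(Node("div", children=links + [headline])))  # False
```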

15. A method for detecting objectionable navigation bars 
in a document, the method comprising: 

a) segmenting the document into components; 

b) for each of the components, determining whether or
not the component is a navigation bar; and 

c) for each of the components that is determined to 
be a navigation bar, determining whether or not the 
navigation bar is disqualified from being classified 
as an objectionable navigation bar. 

16. The method of claim 15 wherein the act of determining 
for each of the components, whether or not the component is
a navigation bar is based on a number of anchors in the
component and a number of non-anchor words in the
component.

17. The method of claim 15 wherein the act of determining 
whether or not the component is a navigation bar includes 

i) determining a number of anchors in the 
component, 

ii) determining a number of non-anchor words in 
the component, and 
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7 iii) if the number of anchors is greater than a 

8 predetermined threshold and if the number of 

9 anchors is greater than the number of non-anchor 

10 words, then determining that the component is a 

11 navigation bar. 

18. The method of claim 15 wherein the act, for each of
the components that is determined to be a navigation bar,
of determining whether or not the navigation bar is
disqualified from being classified as an objectionable
navigation bar includes determining whether a
disqualification condition, selected from a group of
disqualification conditions consisting of (a) if the
component has less than a predetermined number of anchors,
(b) if the component has more than a predetermined
percentage of words of the document, and (c) if the
component is an element of a disqualified component and
that disqualified component has only navigation bar
elements, exists.
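Claim 18 lists three disqualification conditions but leaves their thresholds open. The sketch below assumes flat, hypothetical `Component` records that name their enclosing component. It also assumes that a component's `words` value counts all of its words and that an enclosing component is judged by the same three conditions. The values `min_anchors=5` and `max_word_pct=20.0` are purely illustrative and do not come from the claims:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Component:
    name: str
    anchors: int                  # anchors in the component
    words: int                    # all words in the component (assumed)
    is_nav_bar: bool              # result of the navigation-bar test
    parent: Optional[str] = None  # name of the enclosing component, if any


def is_disqualified(name: str, comps: Dict[str, Component], doc_words: int,
                    min_anchors: int = 5, max_word_pct: float = 20.0) -> bool:
    c = comps[name]
    # (a) fewer anchors than the predetermined number
    if c.anchors < min_anchors:
        return True
    # (b) more than the predetermined percentage of the document's words
    if 100.0 * c.words / doc_words > max_word_pct:
        return True
    # (c) the enclosing component is disqualified and holds only navigation bars
    if c.parent is not None and is_disqualified(
            c.parent, comps, doc_words, min_anchors, max_word_pct):
        members = [x for x in comps.values() if x.parent == c.parent]
        if all(x.is_nav_bar for x in members):
            return True
    return False


def objectionable_nav_bars(comps: Dict[str, Component], doc_words: int) -> List[str]:
    return [n for n, c in comps.items()
            if c.is_nav_bar and not is_disqualified(n, comps, doc_words)]


# Example figures for a 1,000-word page, invented for illustration.
comps = {c.name: c for c in [
    Component("page", 40, 1000, False),
    Component("top_menu", 8, 12, True, "page"),
    Component("footer_links", 3, 6, True, "page"),
    Component("article", 2, 600, False, "page"),
    Component("link_box", 30, 300, True, "page"),
    Component("box_a", 15, 150, True, "link_box"),
    Component("box_b", 15, 150, True, "link_box"),
]}
print(objectionable_nav_bars(comps, 1000))  # ['top_menu']
```

In this example, `top_menu` survives because its enclosing `page` also contains the non-navigation `article`. By contrast, `box_a` and `box_b` are disqualified through their parent `link_box`, which fails condition (b) and contains only navigation bars.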

19. The method of claim 16 wherein the act, for each of
the components that is determined to be a navigation bar,
of determining whether or not the navigation bar is
disqualified from being classified as an objectionable
navigation bar includes determining whether a
disqualification condition, selected from a group of
disqualification conditions consisting of (a) if the
component has less than a predetermined number of anchors,
(b) if the component has more than a predetermined
percentage of words of the document, and (c) if the
component is an element of a disqualified component and
that disqualified component has only navigation bar
elements, exists.

20. The method of claim 17 wherein the act, for each of
the components that is determined to be a navigation bar,
of determining whether or not the navigation bar is
disqualified from being classified as an objectionable
navigation bar includes determining whether a
disqualification condition, selected from a group of
disqualification conditions consisting of (a) if the
component has less than a predetermined number of anchors,
(b) if the component has more than a predetermined
percentage of words of the document, and (c) if the
component is an element of a disqualified component and
that disqualified component has only navigation bar
elements, exists.

21. A method for detecting objectionable navigation bars
in a document, the method comprising:

a) segmenting the document into components by
generating a parse tree based on the document, wherein
a first node corresponding to a first component is a
child of a second node of a second component if the
first component is included in the second component;

b) for each of the nodes of the parse tree,
determining whether or not the node corresponds to a
navigation bar component; and

c) for each of the nodes that is determined to
correspond to a navigation bar, determining whether or
not the navigation bar is disqualified from being
classified as an objectionable navigation bar.

22. The method of claim 21 wherein the act, for each of
the nodes that is determined to correspond to a navigation
bar, of determining whether or not the navigation bar is
disqualified from being classified as an objectionable
navigation bar includes determining whether a
disqualification condition, selected from a group of
disqualification conditions consisting of (a) if the
component associated with the node has less than a
predetermined number of anchors, (b) if the component
associated with the node has more than a predetermined
percentage of words of the document, and (c) if the node
has a disqualified ancestor node and all descendant
nodes of the disqualified ancestor node are associated with
navigation bar components, exists.
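Claim 22 restates the conditions over the parse tree, with condition (c) examining every descendant of a disqualified ancestor. The sketch below uses the same illustrative thresholds and assumes that each hypothetical `Node` already carries subtree totals for anchors and words. For condition (c), it is enough to test (a) and (b) on each ancestor. Suppose an ancestor A were disqualified only through (c). Then some higher ancestor B satisfies (a) or (b) and has only navigation-bar descendants, and B is also an ancestor of the node being tested, so the loop finds B directly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    name: str
    anchors: int      # anchors in this node's component (subtree total)
    words: int        # words in this node's component (subtree total)
    is_nav_bar: bool  # result of the navigation-bar test
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def add(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child


def descendants(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield child
        yield from descendants(child)


def fails_a_or_b(node: Node, doc_words: int, min_anchors: int,
                 max_word_pct: float) -> bool:
    return (node.anchors < min_anchors                          # (a)
            or 100.0 * node.words / doc_words > max_word_pct)   # (b)


def is_disqualified(node: Node, doc_words: int,
                    min_anchors: int = 5, max_word_pct: float = 20.0) -> bool:
    if fails_a_or_b(node, doc_words, min_anchors, max_word_pct):
        return True
    # (c): a disqualified ancestor whose descendants are all navigation bars
    ancestor = node.parent
    while ancestor is not None:
        if (fails_a_or_b(ancestor, doc_words, min_anchors, max_word_pct)
                and all(d.is_nav_bar for d in descendants(ancestor))):
            return True
        ancestor = ancestor.parent
    return False


# Example tree for a 1,000-word page; all figures are invented for illustration.
body = Node("body", 60, 1000, False)
menu = add(body, Node("menu", 8, 12, True))
article = add(body, Node("article", 2, 600, False))
tiny = add(article, Node("tiny", 2, 4, True))
link_farm = add(body, Node("link_farm", 50, 250, True))
col1 = add(link_farm, Node("col1", 25, 125, True))
col2 = add(link_farm, Node("col2", 25, 125, True))

nav_bars = [menu, tiny, link_farm, col1, col2]
print([n.name for n in nav_bars if not is_disqualified(n, 1000)])  # ['menu']
```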

23. A machine-readable medium having machine executable
instructions thereon, wherein when the machine executable
instructions are executed on a machine, the machine:

a) segments a document into components; and

b) for each of the components, determines whether or
not the component is anchor-heavy, wherein if the
component is anchor-heavy, it is determined to be a
navigation bar.

24. A machine-readable medium having machine executable
instructions thereon, wherein when the machine executable
instructions are executed on a machine, the machine:

a) segments a document into components;

b) for each of the components, determines whether or
not the component is a navigation bar; and

c) for each of the components that is determined to
be a navigation bar, determines whether or not the
navigation bar is disqualified from being classified
as an objectionable navigation bar. 

25. An apparatus for detecting navigation bars in a 
document, the apparatus comprising: 

a) means for segmenting the document into components;
and

b) means for determining, for each of the components,
whether or not the component is anchor-heavy, wherein
if the component is anchor-heavy, it is determined to 
be a navigation bar. 

26. An apparatus for detecting objectionable navigation 
bars in a document, the apparatus comprising: 

a) means for segmenting the document into components;

b) means for determining, for each of the components,
whether or not the component is a navigation bar; and

c) means for determining, for each of the components 
that is determined to be a navigation bar, whether or 
not the navigation bar is disqualified from being 
classified as an objectionable navigation bar. 